arei.net ⇒ java
Writing Code for Other People

Writing code is hard. Getting code to work in a repeatable, consistent fashion that addresses all of the possible use cases and edge cases takes deliberate time and fortitude. Yet getting code working is only half the battle of writing good code. See, good code not only solves the problem, it does so in a manner that other people can use your solution to not just solve a problem, but use it to understand the problem, use it to understand the solution, and use it to understand how to go beyond it. Good code works, but great code teaches.

So, with that in mind, how do we as engineers, write great code? What separates code that merely gets the job done from code that empowers people who read it? How do we transform our code from simply functional to deliberately empowering?

Below I present my ideas on what makes code approachable to others. Following these techniques will make your code more readable by others, more understandable by others, and more reusable by others. To meet these goals requires four specific areas of your attention: readability, context, understandability, and re-usability.

Readability: Write Beautiful Code

The first step in writing great code is to make it readable to others. By the very nature of using a higher order programming language, one would think that readability is solved, but a programming language alone is not enough. A programming language, like any communication system, like any language, is merely a means to express an idea. It is how we use that language and how we structure that usage that makes it beautiful or not, that gives it real understanding.

1. Indentation Conveys Hierarchy

Most modern programming languages do not require indentation. You could simply write all your code like this:

function add(x,y) {
return x + y;
}

and it would compile and execute just fine. But clearly, there is a reason why most modern programming languages do allow for liberal indentation. This code with indentation is infinitely more readable:

function add(x,y) {
    return x + y;
}

We use indentation to indicate hierarchy. This works because of how we visual scan things with our eyes. Western language readers are taught to scan from top to bottom, left to right. Indentation plays into that. As you scan down the code from top to bottom, the left to right indentation allows you to easily ascertain hierarchy within the code.

The return statement in the above code clearly is subordinate to the function definition, merely by that act of indenting it.

In fact, beyond even most computer languages, most descriptive languages like HTML, XML, CSS, JSON, etc, all support indentation to imply hierarchy. One of the hardest things of all when inheriting someone else’s code or data is it not being well formatted. Reformatting and format commands work mostly, but not always, and that is where thing can rapidly fall apart.

2. Meaningful Whitespace

Indentation is not the only way we convey meaning in our code using whitespace. Consider this code:

const FS = require('fs');
const contents = FS.readFileSync(process.argv[2], 'utf8');
let lines = contents.split(/\r\n|\n/);
lines = lines.map(line => {
    line = line.toLowerCase();
    line = line.replace(/[^\sA-Za-z0-9-]/g, '');
    line = line.replace(/\s\s|\t/g, ' ');
    return line;
});
lines = lines.map(line => line.split(/\s/));
const words = lines.reduce((words, line) => {
    line.forEach(word => {
        words[word] = words[word] + 1 || 1;
    });
    return words;
},{});
Object.keys(words).forEach(word => {
    console.log(word + ' ' + words[word]);
});

Like our indentation, this is perfectly executable code that will run and do the several “things” it is intended to do. But understanding what those “things” are is not so clear.

Code runs in an organized top to bottom fashion, and we mostly structure that code into logical steps. The code reads a file, the code breaks the file contents up by newlines, etc. These are the steps that your code performs.

If you look at this code you can even begin to see each of these steps. It might take a minute or two, but the steps are there.

Now, look at this code again:

const FS = require('fs');

const contents = FS.readFileSync(process.argv[2], 'utf8');

let lines = contents.split(/\r\n|\n/);
lines = lines.map(line => {
    line = line.toLowerCase();
    line = line.replace(/[^\sA-Za-z0-9-]/g, '');
    line = line.replace(/\s\s|\t/g, ' ');
    return line;
});
lines = lines.map(line => line.split(/\s/));

const words = lines.reduce((words, line) => {
    line.forEach(word => {
        words[word] = words[word] + 1 || 1;
    });
    return words;
},{});

Object.keys(words).forEach(word => {
    console.log(word + ' ' + words[word]);
});

The logical steps of our code are much clearer to see, much cleaner to read. The only change between the two examples is whitespace.

This process of visually splitting logical steps of code up is called meaningful whitespace. Meaningful whitespace is the art of using well placed line breaks to chunk code into discreet steps. Just like indentation, we use whitespace, new lines, to organize the logic of what our code does for other people to easily understand. And this is about more than just breaking up large chunks of text. Smaller blocks of code are easier to understand when examined than larger ones. Visually when reading this code one can easily understand that each of these steps does something in common and works together to achieve some result.

This is another reason that most programming languages ignore whitespace.

When inheriting someone else’s code it is not uncommon to find a complete lack of meaningful whitespace. The burden this places on the developer to go through the code and chunk it up correctly is significant and slows down their rate of understanding. By grouping your code into chunks that are digestible and understandable to someone else you aid them in time to understanding.

Context and Comments

Up to now we’ve talked about how to make our code more readable. Indentation and Meaningful Whitespace provides readers of your code super easy cues to help the scan and breakdown your code and this makes the code more readable. Readability is the first key to great code.

The next step to great code: context. That is, once my code can be easily read, how do I give it meaning so it can be easily understood. And the keys to understanding code is understanding the context of what any section operates within and on, and how the logic flow of any section operates.

3. Contextual Naming

Fun fact, excluding comments and data, the only things in your code that you get to name are classes and variables and functions. Everything else is a pre-defined keyword or operator of some type. This is why class and variable and function naming is so important, because it is the only part of your actual code where you can provide context to what your code does.

Consider the following code:

function x(y) {
   return y < 1 ? 0
        : y <= 2 ? 1
        : x(y - 1) + x(y - 2);
}

What this code does is not obvious by just reading the code. Providing a contextual name to the function changes everything here:

function fibonacci(y) {
   return y < 1 ? 0
        : y <= 2 ? 1
        : fibonacci(y - 1) + fibonacci(y - 2);
}

Class names, function names, variable names, these are opportunities to describe what they represent (classes), what they do (functions), and what they hold (variables). The name provides context to the role in the system.

Now, there are a few special rules around this, for sure. If iterating a number in a loop, we use i for the index. This is a well-known convention. But even this convention can stand breaking some times. If the context is more important than the convention, always go for more context.

4. Convey Logic through Comments

Class names, function names, and variables names all provide context as to their roles, but program logic is dictated by keywords and in almost all programming languages there is no way to change keywords. So how do we provide context and meaning around program logic?

That is where comments come in, and specifically inline comments within the code. I view this differently than documentation. Documentation answers the bigger question about the code purpose and how people use the code. Comments, instead help to describe the logic and the context around the logic in the code. (We will talk about documentation in greater detail in a little bit.)

Comments within your code are simple useful hints as to how the logic within the following section of code works. Have you ever read a block of code and then wondered to yourself what is going on? That is a failure of commenting. The initial author failed to recognize that another developer would understand the code, and that stems from either hubris or laziness, neither of which are particularly good habits to have as a developer.

Now, this is not an advocation for writing a comment before every line of code. I am a firm believer in writing code that is simple to follow and does not require comments to understand, but that is not always the case. And over-commenting creates a whole separate level of problem. Instead, comment where it is needed. When considering any section of code (which you nicely formatted with whitespace), ask yourself “is it readily obvious what it does?” If the answer is even slightly a no, take a second to add a comment or two of context.

This code from an earlier example is a lot more understandable for a handful of comments, and adding these comments as the logic is written takes almost nothing:

// Bring in the standard filesystem library
const FS = require('fs');

// Read the file
const contents = FS.readFileSync(process.argv[2], 'utf8');

// Split our file content into lines and
// convert each line to an array of simple words
let lines = contents.split(/\r\n|\n/);
lines = lines.map(line => {
    line = line.toLowerCase();
    line = line.trim();
    line = line.replace(/[^\sA-Za-z0-9-]/g, '');
    line = line.replace(/\s\s|\t/g, ' ');
    return line;
});
lines = lines.map(line => line.split(/\s/));

// Count the occurance of each word in all the content.
const words = lines.reduce((words, line) => {
    line.forEach(word => {
        words[word] = words[word] + 1 || 1;
    });
    return words;
},{});

// Output our word cloud and frequency 
Object.keys(words).forEach(word => {
    console.log(word + ' ' + words[word]);
});

There are two different uses of comments. First is what we covered above, to add logic understanding to your code. The second use of comments is to add documentation around how your code is used. This second type is covered in a later section, so please keep reading.

For logic comments, the general rule of thumb is to use // instead of /* */ in your code. /* */ block style comments are really more for providing documentation than doing quick one-off logic comments.

Also, when doing logic comments, do not hide them on the end of a line. Put them on their own line so that they are more visible and easier to see.

const superWeirdThing = !superWeirdThing; // DON'T COMMENT LIKE THIS

Organize Your Code

Understanding about your Code is done partially through providing context about what that code does, but equally important to understanding is organizing and structuring your code in a repeatable, understandable, and accepted fashion.

5. Organized from Top to Bottom

The first step in organizing your code is to physically lay the code out in logical sections. Almost all modern programming languages have a recommended approach to organizing your code. Different languages have different approaches to this, and even different approaches within that. JavaScript, is one of the more chaotic languages with little formal agreement on the best way to do this. However, there are still little things you can do without a formal community accepted standard. For example, import/require statements are expected to be first in the file.

The important thing here is to keep like with like. Do not put two class definitions separated by some variable assignments. Put the class definitions together. This is an implied structural organization for your code and it applies at the file level, the class level, and the function level.

A common layout pattern for a file in many languages is:

  • Require Libraries
  • Declare Constants
  • Declare Variables
  • Declare Classes
  • Declare Functions

A common layout pattern for a class in many languages is:

  • Declare Static Members
  • Declare Members
  • Declare Static Methods
  • Declare Constructor
  • Declare Methods

There is a common layout pattern for writing functions in some languages, but JavaScript is a little more lose here, so I am not going to offer a formal pattern. Instead, I will tell you how I like to structure things for a function:

  • Arguments and Argument Checking at the top
  • Declare variables as I need them, but nearer to the top if possible.

6. Separation of Concerns

In computer science the concept of Separation of Concerns (SOC) allows one to divide a computer program into logical sections based on what each section of that computer program does. In Web Development, HTML, CSS, and JavaScript is an example of Separation of Concerns where HTML is responsible for the content, CSS is responsible for how the content looks, and JavaScript is responsible for how the content behaves. Model-View-Controller (MVC) is yet another Separation of Concerns system that is often used.

But beyond a fancy CS Degree, separation of concerns means creating one thing to do that thing well without side effects. It means writing many functions that each do something well, instead of one single function that handles all the cases. It also makes more sense to break things up into smaller discreet units in terms of reusable code.

It also means letting the various pieces do exactly what they are good at and not trying to force them to do tasks that are better suited elsewhere. It is this reason that we should not write style information in JavaScript when a perfectly good CSS rule will do. CSS is optimized for dealing with these things and JavaScript is not, so why would you try to outwit it? Of course, there are always times when you have to break these rules, but they are very, very rare.

Related to SOC, in Computer Science there are three other principals to be aware of:

  • Coupling – Coupling is the measurement of how interdependent two modules or sections of code are to one another. There are a lot of types of coupling between two sections of code with a lot of fancy names and descriptions. For now it is best to just understand it this way: How dependent on A is B and vice-versa? Can A run without B? Can B run without A? Two modules are said to be Tightly Coupled, if they are very dependent on each other. Conversely, they are said to be Weakly Coupled it they are only slightly dependent on one another. Coupling in code is necessary because code executes linearly. However, designing classes and functions to be independent provides for better re-use and more understandable code. Consider some code you are looking at and that moment when you say, “Well, this calls function X, now where is function X and what does it do?” This is an example of coupling.
  • Cohesion – Is somewhat the opposite of coupling as it measures the degree to which things within a section or module of code belong together. This is usually described as having High Cohesion, meaning things within the module belong together, or Low Cohesion, meaning that they do not. In Object Oriented Programming, high cohesion of what is in any given class is a design goal. The class should do what it sets out to do and only what it sets out to do. If I have a Class called Car and inside that class there’s a function called juggle() it probably means my class has poor cohesion (and interestingly probably also implies tight coupling with something else). Side Effects from running a module or function implies poor cohesion. If function juggle() also fills the car with gasoline, either this is an amazing juggling trick, or there is a side-effect in the code.
  • Cyclomatic Complexity – Cyclomatic Complexity is the measure of how complex a section of code is, based on the number of paths through the code. That is to say, the more non-linear outcomes the code generates, the higher its Cyclomatic Complexity score. This is often judge by the number of conditional statements in your code. A function that has no conditionals, is said to have a Cyclomatic Complexity of 1. If it had a single conditional with two out comes (a typical if/else for example) it would have a Cyclomatic Complexity of 2, and so on. Cyclomatic Complexity is not bad, in and of itself, but the more complex the code, the higher the possibility of failure, and the more testing required. A function should strive for a lower complexity score if possible.

So, we talk about these three measures of code in order to reinforce the point of Separation of Concerns. A Separated module is one that is not coupled directly to another module, is cohesive in what it does, and reduces complexity.

When you find yourself in code that is hard to follow, it is almost always because it fails on one of these three factors.

One way to reduce complexity and structure your code better is to identify the cohesive sections of your code and isolate them into Logical Blocks. A Logical Block is a portion of your code that requires multiple lines to complete a single task. As we talked about earlier, breaking your code up with meaningful whitespace makes these logical blocks easier to identify. Adding comments makes them clearer in their purpose.

But a Logical Block also is a good hint as to how to abstract logic into small, more cohesive behaviors. For example, if your logical block needs to get reused, isolate it out into a new function.

Also, it is worth mentioning about Code Folding. Code Folding is a feature of most modern IDEs that allow you to hide (or fold) blocks of code. Code Folding in your IDE can be a powerful code reading tool. However, Code Folding requires certain syntax structures to work. By thinking of your code as Logical Blocks you can see where Code Folding opportunities could exist, if the logical block was in an acceptable structure.

Thinking about your code as a Logical Block allows you to more easily see other ways to do the same thing, ways that may be cleaner and more cohesive. It illustrates where there may be room for improvement in your code as well.

For example, this code is a Logical Block.

let sum = 0;
for (const num of numbers) {
    sum += num;
}
const avg = sum / numbers.length

You could change this logical block to be more readable by…

  • Abstracting it out to a function, especially if it is going to be reused.
const avg = calculateAvg(numbers);
  • Wrapping it in a IIFE (Immediately Invoked Function Expression)
const avg = (() => {
    let sum = 0;
    for (const num of numbers) {
        sum += num;
    }
    return sum / numbers.length; 
}());
  • Using a built-in library to do it simpler.
const avg = numbers.reduce((sum, num) => sum + num, 0) / numbers.length;
  • Using a third party library helper function.
const sum = _.avg(numbers);

All of these are better than the first because they create an isolated block that cleanly identifies what it does. With proper whitespacing and a comment above the block, makes this tight, readable, well reasoned code.

Code Usability

At some level, everything you see, smell, taste, touch, or interact with is a User Interface. An API, for example, is a User Interface between your backend and your frontend. This goes for the code you write as much as anything else. At some point, another person is going to read your code, run it, and even have to make sense of it. So, build your code like you would an API. Take into consideration how another person is going to interact with that code, how they are going to try and reason about it, how they are going to try and work within it.

Build everything with another user’s experience in mind.

7. Defensive Coding

Software is a piece of code that produces some sort of output, usually based on some sort of input. Now, assume someone is going to run your code at some point. Assume that that person is not going to read the documentation. What would happen? Would it work if the first argument is a null? Would it work if the second argument was a negative zero or a NaN? How your code behaves when executed under uncertain conditions is a testament to the quality of engineering employed in building it.

An approach to handling these questions is called Defensive Programming. In defensive programming you assume that users will maliciously try to cause problems with your code and therefore your code must guard against their attack. This means making the assumption that all data is tainted and must be verified to be correct.

Here’s an example:

if (this.onClose) this.onClose();

The assumptions made by this code are not defensive. The snippet assumes that onClose is a function and then executes it as a function. A corrected version of this code might look like this:

if (this.onClose && this.onClose instanceof Function) this.onClose();

But there is another assumption here as well, that executing this.onClose() will work? So even more defensive code might do it thus:

if (this.onClose && this.onClose instanceof Function) {
    try {
        this.onClose();
    }
    catch (ex) {
        ... do something about the error ...
    }
}

There is even an argument, that another defensive point might be that if this.onClose() is a long running process, it could cause your thread execution to grind to a halt and an even more defensive posture might be thus:

if (this.onClose && this.onClose instanceof Function) {
    setTimeout(()=>{
        try {
            this.onClose();
        }
        catch (ex) {
            ... do something about the error ...
        }
    },0);
}

That last form might not work for all situations, but it is extremely defensive.

The key to writing code defensively, is to checking your inputs early in your code. Make sure what you are getting is what you are expecting. If your function only works with even numbered integers, you better make sure it takes only even numbered integers.

The same is true for classes. Write your classes to assume that users are going to try to harm them and break your code. This means writing defensive methods but also, preventing unwanted or unknown class state mutation. Setters are a great way to protect your class from bad user input and should be used whenever possible instead of exposed public members. Consider this class:

class Car {
    make: 'Chevy';
    model: 'Malibu';
    mpg: 25;
    tankSizeInGallons: 14;

    toString() {
        return `The ${this.make} ${this.model} can go ${this.mpg * this.tankSizeInGallons} miles on a single tank of gas.`;
    }
}

The publicly exposed make, model, mpg, and tankSizeInGallons are all mutable without any thought for defensiveness. A malicious coder could cause quite a problem with these variables. More defensive style would be to use private variables (or Symbols) to hold the values, and provide public getter and setter functions to expose them. The setter functions, in particular, can do error checking and ensure type and values match acceptable inputs.

8. Code is a User Interface

Your code, whether a component, a utility, or an API, is going to get used by someone else. This is even more true when one considers that sharing of code online has become pervasive. And if you work in a team setting at all, this compounds the likelihood someone else will need your code. I do not care how throw away you believe your code is, assume that someone is going to use it.

To that end, and as stated before, treat your code like a User Interface. This means designing your code to be read is important, but also designing your code to be used by others is equally important. Ask yourself how another user is going to come in contact with your code. If they see the function signature, does it provide enough information as to what it does and how it works or does it require additional information? If they read your code, is it obvious how they would step through it from line to line or does it require more contextual logic?

A lot of what we have already covered in this document describes how to address writing code toward usage by others, but it bears repeating here:

  • Design your code to be read by others.
  • Highlight hierarchy with indentation.
  • Separate discreet sections of behavior with whitespace.
  • Name classes, functions, and variables contextually.
  • Provide comments around programming logic.
  • Clearly Separate your application concerns.

Following these steps is the first part of writing code like a user interface.

The second part is to always be asking yourself, how will others use my code? Challenge yourself to always be mindful of this question and your code will be the better for it.

Most professional coding work is a shared environment where everyone is always working on each other’s code; any team experience really is. As such it is critical to always be considerate of how others are going to use your code. Whenever you write a new component you should be consciously aware that someone else may very well need what you are building for their own purposes. Write your component with that reuse in mind. But also, be insatiable in your desire to make your component as accessible to another developer as possible. The more we reuse one another’s components, the better our product becomes.

9. Documentation

Finally, let us talk about documentation. To many software engineers the act of writing documentation is seen as an after thought, something to be farmed out to some loser English Literature person desperate for a job. But the reality of software engineering is that it is only 50% about writing code. The remaining part of Software Engineering is communicating with other people.

As we have stated above: someone else is going to use your code. It is inevitable. Given that, telling those users how to use your code is critical. If you fail to communicate how your code runs, you might as well not have written the code at all.

We talked about comments in a prior section. Comments are used to describe the logic your code follows. Documentation, on the other hand, describes how a user of your code interacts with the code and gets it to do what is expected.

Unlike inline comments to describe logic blocks, documentation is about describing the interfaces of your code, the places where others will interface with your code. We write documentation to tell others whom are using the code what it does and how it is used. This does not mean you document every single function and variable. Instead this means you document every public facing part of your code, every place other will interact with it.

Fortunately, in this day and age, you do not even have to lift your eyes out of your IDE to write documentation. Any modern language today has an associated documentation language built into it. Java has JavaDoc, Typescript has TSDoc, JavaScript has JSDoc, etc. These documentation languages assist you in describing your class, function, variables, every aspect of your code as much as you want.

Even better, most modern IDEs have these documentation languages built right into them and have tooling that makes writing that documentation easier.

Here’s an example of using JSDoc:

/**
 * Move the selection to the "first" section.
 * @return {void}
 */
selectFirst() {
    const first = this.sections.find(section => section.startHere);
    this.select(first || this.sections[0] || null);
},

In this example the comment block that precedes the selectFirst() function describes what it is and what it returns.

Documentation languages vary by programming language, so please read up on your particular one.

One note about documenting that to address: Your code is not self-documenting. Many, many people have said this over the years and every single one of them has been wrong. What they are really saying is their code is Readable and that is enough. In some case this is true, but mostly it is not. Please add documentation to your code about what it does and how it is used.

Denouement

So that’s it. Above we outlined nine (9) separate ways to make your code more readable, more descriptive, more defensive, and more user friendly. It’s a lot, to be sure, but each of these approaches is a small step towards the larger goal of writing better code that is more than just runnable.

We recognize that this could be seen as a lot. We all know that software is a demanding job full of demanding bosses and customers who think it is super easy. And these best practices they take additional time, they cost cycles, and adding them can seem overwhelming. However, starting small is the key here. When considered as a whole, these rules can seem daunting. But individually, they are not so rough. Try adopting one or two of these at first, then a few weeks later add another, until they are all second nature. It’s a good bet you are probably already doing one or two of these without even thinking about it. So, add a few more, then a few more, and next thing you know you will be shipping code you are proud to have others to read.

permalink: http://www.arei.net/archives/302
Why Dependency Management Pisses Me Off

Yes, it’s true. Dependency Management Pisses Me Off. Jason van Zyl over at Sonatype needs to be kicked in the groin… repeatedly. (Sorry, I don’t really know Jason and it’s not nice to say such things, but I wanted to really hammer home the point. Jason, I apologize to your testicles.) Seriously, when did we become so amazingly lazy that saving a JAR file into our SVN repositories became a big deal?

Now, I don’t want you to just think I’m some raving lunatic out there on his soap box shouting into the wind despite the accuracy of this picture; I want to at least pretend that you are not going to scream “TL;DR” and actually read the damn posting, so here is why Dependency Management pisses me off. Feel free to reply back to me on Twitter (@areinet) and I will engage you in some spirited debate. And I promise I won’t hurt your testicles in the process.

1). Dependency Management adds unnecessary complexity. Are we not just talking about saving some files into our SVN repository after all? Why is that so hard? And who on earth thinks that writing and changing a pom.xml file is actually easier than this? Also, there are people who would say that you shouldn’t be committing built objects into SVN (or whatever) and that we shouldn’t waste disk space. To these people I say this: Disk space is cheap. Seriously, the going rate for a HD is around 9gb per $1 USD. I assure you, no matter how many JAR files your project needs this is incredibly cheap.

2). Dependency Management puts someone else in your build loop. When I build a project, I want to rely on as few people as possible to fuck things up. Yet Dependency Management injects a completely incalculable third party into your system, and that’s just for one dependency. Sure, that dependency is always there, but with external dependencies, your are practically begging for your build to break because John Bozo three countries away from you removed six bytes of code from that one project you were relying on. Now, of course, you shouldn’t be using LATEST in your dependencies, but I really don’t want to rely on the fact that our build people are smart enough to realize this. If I just committed the version I wanted to the repository, none of these problems happen.

3). Licenses Change. When your build person goes out there and changes a version number, do you think they actually read a license file? Let’s assume for the sake of reality that people are incredibly lazy… now, do you want to take the risk that so and so actually read the license? And that the license didn’t change? Seriously, it would take me about six seconds to change the license on some sub-project you use and then commit it. And suddenly the sub-project owns all of your IP simply because you used them. Now, the legality of that is a debate for other scholars than I, but it could certainly cause a mess. The only thing stopping a sub-project from doing this,is the hassle of suing your ass into oblivion… And we all know that people are getting more and more litigious every day.

4). The Internet is, by it’s nature, unreliable. Do you really want to rely on the fact that the internet providers upstream from you are not going to screw something up right when your absolutely must deliver build has to get run? Seriously, the more people that have input into a process, the more likely that process is to get derailed. I do not want to think that my ability to deliver is dependent on whether or not Anonymous is going to cause a world-wide outage in protest over SOPA (which sucks by the way). Sure, you can run a mirror of x repository and spend your time maintaining that as well, but wouldn’t you rather spend time, oh I don’t know, outside? With a girl? playing WOW? Doing anything else?

So the point here is this and this is the TL;DR for you lazy people as well… Dependency management adds both complexity and unpredictability to your systems and this is not a good thing. A Build process is about Rigor, and Dependency Management is antithetical to rigor. By using a Dependency Management solution you are willingly signing up for problems and extra work. Who wants that? When given the choice between that and just storing the files I need into my repository, I will choose the latter every time.

Now, I do think some dependency systems are way better than others. The node.js NPM system is amazingly clean, but it’s still begging for the problems I outline above. So, maybe not that awesome. It is easy to use though, wish Maven were half that easy.

So, that’s it. That’s why every time my coworker comes in and raves about how awesome Maven is I just point at his crotch and start laughing. I mean, really, Maven? Awesome? You got to be an idiot to think that. (My apologies to my coworkers.)

permalink: http://www.arei.net/archives/246
Developer Drift

Lately I’ve been the subject of what I call developer drift.

Developer Drift is the process by which an unchallenged developer slowly moves from one project to the next. The project may be in house or external or something completely fabricated by the developer’s mind, but it basically means a developer is less interested in the current project than the project over the horizon. It’s the “Grass is always greener” truism made concrete in software engineering.

For me, this has taken the form of the fact that our customer wants really boring user interfaces which I can crank out like they are nothing. Problem is, I almost never crank them out, because they are meaningless and I never feel challenged/creative by them. (For me challenged=creative.) So I take forever to implement them and tend to make a lot of excuses on why this is taking so long. I feel justified in why it takes so long in the fact that when I am challenged, I really do crank out the code at an extraordinary rate which borders on the obscene when compared to average developers. I’m very prolific when I want to be.

The same thing was true when I was back in college studying English Literature. I could crank out a paper that I found interesting in no time flat, but assign me something that was pedantic and I’d sooner rip my own teeth out with a spoon (“because it will hurt more”).

So the real question I’m trying to find an answer to is “How do you deal with developer drift?” How do you stop people from losing interest when they are bored because the project has become boring? A project I used to work on is suffering from this very problem… they want to keep the team together, but the more boring stuff they do, the less likely they are to be able to keep the team together? Is there a way for this project to challenge it’s developers at the same time as doing boring things? Would contests or “feats of skill” help keep things from getting stale? Or should they just accept this as the life cycle of the developer, assume that people are going to drift away, and prepare for the next generation?

You tell me… how does your project deal with developer drift?

permalink: http://www.arei.net/archives/213
Maven Vitriol

I’m an ANT guy. I’ve been using ANT since 1999. It’s amazingly powerful, amazingly useful and very flexible… and I’ve never once written my own ANT task. Just using the ANT tasks available or out there in the community I have been able to build hundreds of software projects.

Now, of course, everyone says, “Go Maven”. So I went Maven… and it sucked.

See, I have very strong feelings about Frameworks. (Not to confuse you here, but Maven, like ANT, is a Build Framework.) The problem with 99% of all Frameworks out there is that they force you into a specific way of doing something. “But that’s the whole point,” you scream at me. And I agree, that’s the point… until the moment you need to do something else.

Now, I use frameworks all the time in my software development, we all do. There’s one listed over on the right side in my links section that I actually endorse. So am I not hypocritical for deriding frameworks in one breathe while using them in another? Of course I am, but here’s where i’ll caveat it… I use frameworks that provide the least limitation on me doing new things. Maven, as an example is a rigid framework. Doing something new with it is difficult challenge. ANT on the other hand, while limiting, isn’t nearly as limited.

So here’s AREI’s Framework Measurement Testing System. First, evaluate a framework for the project you are working one, say building a Java Swing application. Next, evaluate the framework for another, completely different project you might work on in the future, say building a Website. How does the framework stack up for both things? Finally, consider what’s going to happen when you need to take the framework beyond it’s scope into new places. How accepting of that path is it?

Here’s some examples:

I had an ANT script that would append together a bunch of .JS files and then minify the entire set. Worked like a champ. The project I was on decided let’s give Maven a try instead of ANT. So I had to come up with a way to do the same thing in Maven. Two weeks of work later I ended up just calling ANT from inside of Maven.

Anyway, this post was really meant as just a simple link to an awesome blog posting that unleashes some much needed fury on Maven. But I got a little carried away in the intro.

So in summation, Maven sucks. Pick a framework that you can change. Damn the man! Save Empire!

permalink: http://www.arei.net/archives/193
Google Wave

Google announced yesterday a new offering called Google Wave.  There is an excellent article at TechChrunch that gives a complete overview.  All I can say is: THIS IS HUGE PEOPLE. The concept of Google Wave, the underlying idea is exactly the right step that the web and the internet needs to take. It’s a combination of Email, Instant Messaging, Twitter, Transparency, Collaboration, Wiki, Media sharing, and so much more.  Oh, and it’s an Open API and Open Source to boot.

If you’ve ever talked tech with me in the last three years and we’ve had the discussion about what is next in technology, then we’ve had a very similar discussion about the concepts behind Google Wave.  I’m not claiming Google has once again stolen my idea, but what I will claim is that there clearly is a need for this type of product that I saw and google saw and others saw as well.

Two things in particular I want to call your attention to that make Google Wave a huge idea:

1). Adhoc groups… What adhoc grouping really does for you is to let people create internet groups as easily as they create real world groups.  Think of it this way… when you walk into your work place break room and two other people walk in and the three of you start talking, you have created a group.  It’s fast in the real world, why can it not be like that in the internet world? For most of the internet when you want to form a group and start working together there’s a fairly large investment in creating the meta systems the group needs: setting up a mailing list, setting up forums, setting up a media store, setting up a user database, etc etc.  It’s a pain in the ass really, and it make setting up a group fairly non-trivial.  To some degree, Yahoo Groups and Google Groups automates a lot of that, but then you still have to go through the effort of getting people to join etc.

What Google Wave does is to make group creation simple and almost instantaneous.  Basically, you create a group by dragging a bunch of contacts together and off you go.  You can immediately begin discussion, expand the group, whatever.  Additionally, if you need other tools for that group like a map or a document or some other thing, you can add a widget or a robot to participate in that  group and boom, you have more functionality to fill the groups need.

2) The second feature that is huge for me is transparency.  Transparency is the notion behind Twitter in that you broadcast snippets of your activities out to the world and everyone can see what you are doing.  This is the beginning of transparency though.  To take it further you need to automate transparency such that as you do things, they automatically are published.  Of course, this brings up privacy concerns, but I think there are simple ways to solve this.

I know that Wave has some level of transparency, but it’s still to early to tell how much.  I suspect that even if this is an opt in version of transparency, like Twitter, that soon there will be robots and widgets (the extensions to Wave) that will automate a lot of this functionality.

The point is, though, that transparecny can be a huge social tool… but more importantly it could be a huge business tool.  And the company that sees this is going to get a huge edge over the company that does not.

Anyways, that’s my thoughts on Google Wave.  You should efinately check it out.

permalink: http://www.arei.net/archives/122
Minification

There has been a great deal of discussion lately about the value of minification, yet very little concrete value. To that end I have decided to set down my thoughts from these discussions and share them such that everyone can be on the same page. Below I will outline what minification is and what it gains you. This is followed by discussion of minification versus HTTP Compression. Next, I will look at concept of minification as obfuscation and the usefulness of obfuscating in general. Finally, I will share my recommendation on the subject of Minification and which tools I would recommend.

MINIFICATION

To start, lets define what minification is: Minify or Minification is the process of taking some source file and compressing it to a smaller size by removing whitespace and comments. A number of other small rules are sometimes applied as part of the minify process, but these are less significant. For example, Minify will shorten variable names to two or three characters and thus gain some space there.

Consider this simple function which is not minified:

for (var i = 0; i <; 100 ; i++)
{
    var randomnumber = Math.floor(Math.random() * i);
    document.write("A random number between 0 and " + i +
                   " is " + randomnumber);
}

When minification is applied this function becomes thus:

for(var i=0;i<;100;i++){var randomnumber =Math.floor(Math.random()*L);document.write("A
random number between 0 and "+i+" is "+randomnumber);}

(Please note that any line breaks you see in the minified code above are due to word wrapping in your mail reader. Minified code is generally always a single line.)

In essence, minification reduces overall code size, and thus reduces the clients download time. The downside of minification is that by removing whitespace and comments and renaming some variables, the code becomes unreadable, extremely hard to debug, and may introduce unforeseen and hard to find bugs.

When we are talking about Web applications both Javascript (JS) and CSS files can be minified.

In a small test case I put two files through the minification process to see what the net gain would be. The Javascript file is of small size with few comments. The CSS is a large CSS file with few comments.

FILE SIZE                                JS                    CSS             
Bytes                                    17954                 39020
Minified Bytes                           10413                 31178
Net Gain                                 42%                   20%

As can be seen minification is much more useful to Javascript where whitespace is used much more. CSS tends to have less whitespace and thus it's gains are lesser. Also, gains will be much larger if your code is heavily commented. Since comments are removed, minification gains are directly related to the amount of comments.

So, a 42% gain seems like a lot when you are talking about really slow network connections. Yet the question is, are those gains really worth the sacrifice in the ability to debug your code? And if one does not want to sacrifice the ability to debug and read the code, what can be done instead of minification?

HTTP COMPRESSION

HTTP Compression is a standard part of the HTTP 1.1 protocol that is in use in almost all major browsers and web servers on the market today. In essence whenever an HTTP request is made by a browser, if that browser supports HTTP Compression a special parameter is added to the outgoing request that lists the different compression algorithms that the browser can uncompress. When a Web Server sees this special parameter, it checks to see if the requested file can be compressed and if the server has one of the compression algorithms that the browser has said it can use. If all this is true, the server will run the compression algorithm over the response before it is sent back to the browser. The browser receives the compressed response and make it uncompressed before making it available.

HTTP Compression has many advantages and few disadvantages. Foremost, HTTP Compression is an entirely automated process that is handled by the server and requires only a minor configuration change on most Web Servers to enable. The Browser requires no special functionality to use compression. The disadvantage of compression is that there is a minor performance hit to both the server side and the client side. The server side must spend time compressing the response. The client side must spend time uncompressing the response. Yet, in most cases the file sizes involved make the time spent in compression minimal. If files sizes were significantly larger, compression might begin to cause performance problems.

Let us consider our test files again. This time let us look at our net gain from using HTTP compression only.

FILE SIZE                                JS                    CSS             
Bytes                                    17954                 39020
Compressed Bytes                         3906                  6708
Net Gain                                 78%                   82%

Right away its apparent that HTTP compression offers significant gains. When compared to our previous results from minification, HTTP Compression is the clear favorite. The reason for these significant gains that the HTTP Compression works across all the bytes of the source files whereas Minification largely leaves the meat of the CSS and JS files alone and works mostly on the whitespace of those files.

Given the gains of HTTP Compression the next logical step is to ask, what if I did both minification and HTTP Compression. The result is even greater improvements, but not as much as one might imagine. Consider our test files again:

FILE SIZE                                JS                    CSS             
Bytes                                    17954                 39020
Compressed & Minified                    2514                  5907
Net Gain                                 86%                   85%

Overall when employing both techniques, the gain of minification is not as significant. For our JS file there is only an 8% gain over just plain HTTP Compression. There is even less gain when looking at the CSS file. However, please remember that both the JS and CSS files in our test are very light on comments. More comments will increase the gains somewhat, but less than one might think.

Given these results, the question to be asking is does the sacrifice of being able to debug and read my Javascript and CSS outweigh my need to gain an additional 8% over just using HTTP Compression.

So maybe Minification is not the way to go after all, but what about Minification as a means of protecting my code from people whom want to steal my hard work?

OBFUSCATION

In the world of interpretive computer languages such as Java, C#, Javascript, PHP, Groovy, etc there has long been the question of how one can prevent someone from reading or stealing their code. The more openly accessible the language, the easier it is to read and copy it.

Obfuscation, is the process of applying a series of heuristics to code that will make it harder to read and more confusing to understand. The intent is that by applying these rules, the code becomes such an confusing mess that nobody would bother to read the code.

Java Obfuscation is a fairly complex process. There are hundreds of heuristics that are applied to the incoming source code. The resulting end code is a nightmare of crazy classes, method and members names that make scanning the code painful.

Obfuscation for Javascript is much less powerful. Because Javascript is meant to be a widely open language, the ability to obfuscate the code is minimal. Variables, objects and methods in Javascript have no scope privacy and thus can be accessed by anyone and are often designed that way. This means that their names cannot be obfuscated because the names have relevance to the structure. You cannot change the name of a given method, because in many cases there is no way to tell whom has to call that method or where in the code that might happen.

Yet the biggest problem of all with Obfuscation is that it is almost completely worthless. Obfuscation will only stop the most basic of people from copying your code. Extracting obfuscated code is fairly simple in most interpreted languages. With many of today's sophisticated IDEs reformatting and stepping through code is fairly simple. Additionally, many IDEs support variable name matching such that when an obfuscated variable name is selected all of the matching uses are highlighted. All of this means taking obfuscated Javascript back to readable code is far easier than one might suspect.

Another big strike against obfuscation is that Javascript can be employed in very diverse and powerful ways. Closures, functions as objects, objects as associative arrays, and the like, can all lead up to some fairly complex Javascript. Often times this kind of programming can confuse an obfuscator that is not designed to handle the diversity of the language. Even when writing this email I managed to break one obfuscator with just the sample for loop code from above.

Obfuscation, at its best, serves to help keep honest people honest. It is not going to stop someone whose intent is to copy your code, it's not even going to put up a decent struggle.

Minification, while not technically an obfuscator, does provide some obfuscation like processes. The removal of whitespace and the collapsing of localized variables all makes the code harder to read. Yet, not that much is really changed otherwise. Consider our sample code block that we looked at earlier. With minification obfuscation, it is still fairly readable:

for(var L=0;L<;100;L++){var dS0=Math.floor(Math.random()*L);document.write("A random number between 0 and "+L+" is "+dS0);}

Given this code and about 2 minutes on the internet or a few search and replace calls and you can turn it back into fairly readable code as shown here:

for (var L = 0; L <; 100; L++) {
    var dS0 = Math.floor(Math.random() * L);
    document.write("A random number between 0 and" + L + "is" + dS0);
}

It's not a perfect match of our original code, but it is fairly close and takes little real work to achieve.

Ultimately, Obfuscation can have some utility in giving you a very basic first line of defense, but the ability to debug and read your code is reduced and in some instances obfuscation can add to code size, albeit a very small amount. These limitations must be carefully considered before undertaking obfuscation. What are you really trying to do when you obfuscate your code?

Unfortunately, there is no way to really protect your Javascript code on the internet. The design of HTML, Javascript, and CSS is all meant to be plain text readable formats and this means that absolutely nothing stands between the site code and the end user. The only real protection you have from people reading and copying your code is in how much the worry about your legal recourses.

CONCLUSIONS

So where does this leave us with regards to Minification?

Well, we showed that from a size compression stand point, minification has some small value, but not nearly as much value as just turning on HTTP Compression. From an obfuscation standpoint we outlined the fundamental weakness of obfuscation in general. Given these factors my recommendation is for careful consideration of whether or not minification is actually needed in the particular case you are making.

I would ask myself these questions:

1). How important is the ability to read and debug my code?

2). How much time and resources do I wish to invest in tracking down bugs in minified code?

3). How relevant is the commenting in my code to the ability to debug and read my code?

4). With Obfuscation, whom am I trying to protect my code from and how competent are they?

Ideally, I think the best application is to do no Obfuscation, HTTP Compression, and a reduced form of Minification in which only comments are removed from the code. This provides you with high compression and still maintains the ability to read and debug your code. In my opinion obfuscation really is not worth the effort to employ it for the benefit you receive from it.

CONFIGURATION NOTES

HTTP Compression can be turned on in either Tomcat of Apache Web Server with just a little bit of effort. Tomcat requires a mere 3 lines of code to get HTTP Compression working. Apache HTTP Server requires a few more lines, but it is still relatively easy. For more information about configuring tomcat see http://viralpatel.net/blogs/2008/11/enable-gzip-compression-in-tomcat.html. For more information about configuring Apache Web Server see http://httpd.apache.org/docs/2.0/mod/mod_deflate.html.

TOOLS NOTES

Finally, I will plug the YUI Compressor. There are a lot of Minification programs out there, but YUI Compressor seems to allow for the most configurability. Also, I liked YUI Compressors ability to minify both CSS and JS files within one simple program. You can find out more about YUI Compressor at http://developer.yahoo.com/yui/compressor/

permalink: http://www.arei.net/archives/97
Free is the new Black

So, being a Java guy I’m fairly religious to Sun as a company.  I invested in their stock a while back (for good or bad) and thus I always kind of keep an eye on the company.  Recently I came across a series of video blogs by Jonathan Schwartz, the CEO of Sun.  The one I want to share with you is his second video blog.  The blog talks about the Sun view of the technological market place, and how giving brands away for free (like Java and MySQL and others) drives adoption, which in turn can drive revenue. Now, I’m no corporate financial officer, just a code monkey, but I found the blog well thought out and I thought I would share it with you.

My own company is currently working on a new project with the goal of productizing it.  We’re still in the early development of this product and haven’t had the discussion about how to sell it to consumers yet.  However, as I go about building the User Interface for this site I cannot help but keep thinking to myself (even prior to reading Jonathan’s blog) that the value of the site can only be realized by driving users to the site (or as Schwartz put it: building adoption).  In order to achieve this, the site would need to be largely free to the masses and make its revenue either through some ancillary stream or by charging only for some specific “higher role” usage.  As I designed the front end and go about coding it, I keep telling myself that the site is meant to be used by millions of people freely and must scale accordingly.

Like I said, I’m not the business guy, just a code monkey, but I really can see this notion of giving something away and finding revenue beyond just usage.  I like to think of it as the Field of Dreams model… In the movie Ray, the main character, is told to “Build it and they will come” which is much like the philosophy Schwartz is espousing.  The existence of the Field of Dreams brings the people, and there is surely revenue to be made after the fact: selling popcorn or something; the method is not as important as getting the people, driving adoption.  If you have the people, selling them something they want beyond what you are offering should be easy.

Anyway, if you’d like to read/watch the video blog I am talking about, you can find it here: http://blogs.sun.com/jonathan/date/20090306

If you’d like to read/watch the entire video blog series (there are four of them), you can  find it here: http://blogs.sun.com/jonathan/date/20090302

If you’d like to follow Jonathan Schwartz’s blog, you can find it here: http://blogs.sun.com/jonathan/

permalink: http://www.arei.net/archives/84